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Abstract 



We study theoretical runtime guarantees for a class of optimization problems that occur in a wide va- 
riety of inference problems. These problems are motivated by the lasso framework and have applications 
in machine learning and computer vision. 

Our work shows a close connection between these problems and core questions in algorithmic graph 
theory. While this connection demonstrates the difficulties of obtaining runtime guarantees, it also 
suggests an approach of using techniques originally developed for graph algorithms. 

We then show that most of these problems can be formulated as a grouped least squares problem, 
and give efficient algorithms for this formulation. Our algorithms rely on routines for solving quadratic 
minimization problems, which in turn are equivalent to solving linear systems. Finally we present some 
experimental results on applying our approximation algorithm to image processing problems. 
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1 Introduction 

The problem of recovering a discrete, clear signal from noisy data is an important problem in signal 
processing. One general approach to this problem is to formulate an objective based on required properties 
of the answer, and then return its minimizer via optimization algorithms. The power of this method was first 
demonstrated in image denoising, where the total variation minimization approach by Rudin, Osher and 
Fatemi [ROF92] had much success. More recent works on sparse recovery led to the theory of compressed 
sensing [Can06], which includes approaches such as the least absolute shrinkage and selection operator 
(LASSO) objective due to Tibshirani [Tib96]. These objective functions have proven to be immensely 
powerful tools, applicable to problems in signal processing, statistics, and computer vision. In the most 
general form, given vector y and a matrix A, one seeks to minimize: 

min||y — Axjll (1-1) 
subject to: |x|i < c 

It can be shown to be equivalent to the following by introducing a Lagrangian multiplier, A: 

min||y — Ax||| + A|x|i (1.2) 

Many of the algorithms used to minimize the LASSO objective in practice are first order methods 
[NES07, BCG11], which updates a sequence of solutions using well-defined vectors related to the gradient. 
These methods are guaranteed to converge well when the matrix A is "well-structured". The formal 
definition of this well-structuredness is closely related to the conditions required by the guarantees given 
in the compressed sensing literature [Can06] for the recovery of a sparse signal. As a result, these methods 
perform very well on problems where theoretical guarantees for solution quality are known. This good 
performance, combined with the simplicity of implementation, makes these algorithms the method of 
choice for most problems. 

However, LASSO type approaches have also been successfully applied to larger classes of problems. This 
has in turn led to the use of these algorithms on a much wider variety of problem instances. An important 
case is image denoising, where works on LASSO-type objectives predates the compressed sensing literature 
[ROF92]. The matrices involved here are based on the connectivity of the underlying pixel structure, which 
is often a \fn x \fn square mesh. Even in a unweighted setting, these matrices tend to be ill-conditioned. 
In addition, the emergence of non-local formulations that can connect arbitrary pairs of vertices in the 
graph also highlights the need to handle problems that are traditionally considered ill-conditioned. We 
show in Appendix A that the broadest definition of LASSO problems include well-studied problems from 
algorithmic graph theory: 

Fact 1.1 Both the s-t shortest path and s-t minimum cut problems in undirected graphs can be solved by 
minimizing a LASSO objective. 

Although linear time algorithms for unweighted shortest path are known, finding efficient parallel 
algorithms for this has been a long-standing open problem. The current state of the art parallel algorithm 
for finding 1 + e approximate solutions, due to Cohen [CohOO], is quite involved. Furthermore, as the 
reductions done in Lemma A.l are readily parallelizable, an efficient algorithm for LASSO minimization 
would also lead to an efficient parallel shortest path algorithm. This suggests that algorithms for minimizing 
LASSO objectives, where each iteration involve simple, parallelizable operations, are also difficult. Finding 
a minimum s-t cut with nearly-linear running time is also a long standing open question in algorithm design. 
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In fact, there are known hard instances where many algorithms do exhibit their worst case behavior [JM93]. 
The difficulty of these problems and the non- linear nature of the objective are two of the main challenges 
in obtaining fast run time guarantees for grouped least squares minimization. 

Previous run time guarantees for minimizing LASSO objectives rely on general convex optimization 
routines [BV04], which take at least 0(n 2 ) time. As the resolution of images are typically at least 256 x 256, 
this running time is prohibitive. As a result, when processing image streams or videos in real time, gradient 
descent or filtering based approaches are typically used due to time constraints, often at the cost of solution 
quality. The continuing increase in problem instance size, due to higher resolution of streaming videos, or 
3D medical images with billions of voxels, makes the study of faster algorithms an increasingly important 
question. 

While the connection between LASSO and graph problems gives us reasons to believe that the difficulty 
of graph problems also exists in minimizing LASSO objectives, it also suggests that techniques from 
algorithmic graph theory can be brought to bear. To this end, we draw upon recent developments in 
algorithms for maximum flow [CKM + 11] and minimum cost flow [DS08]. We show that relatively direct 
modifications of these algorithms allows us to solve a generalization of most LASSO objectives, which we 
term the grouped least squares problem. Our algorithm is similar to convex optimization algorithms 
in that each iteration of it solves a quadratic minimization problem, which is equivalent to solving a linear 
system. The speedup over previous algorithms come from the existence of much faster solvers for graph 
related linear systems [ST06], although our approaches are also applicable to situations involving other 
underlying quadratic minimization problems. 

The organization of this paper is as follows: In Section 2 we provide a unified optimization problem 
that encompasses LASSO, fused LASSO, and grouped LASSO. We then discuss known applications of the 
grouped least squares minimization in Section 3 and existing approaches in Section 4. In Section 5 we give 
two algorithms that use solving quadratic minimization problems as underlying routines: An approximate 
algorithm based on the maximum flow algorithm of Christiano et al. [CKM+11], and an almost-exact 
algorithm that rely on interior point algorithms. 

2 Background and Formulations 

The formulation of our main problem is motivated by the total variation objective from image denoising. 
This objective has its origin in the seminal work by Mumford and Shah [MS89]. There are two conflicting 
goals in recovering a smooth image from a noisy one, namely that it must be close to the original image, 
while having very little noise. The Mumford-Shah function models the second constraint by imposing 
penalties for neighboring pixels that differ significantly. These terms decrease with the removal of local 
distortions, offsetting the higher cost of moving further away from the input. However, the minimization 
of this functional is computationally difficult and subsequent works focused on minimizing functions that 
are close to it. 

The total variation objective is defined for a discrete, pixel representation of the image and measures 
noise using a smoothness term calculated from differences between neighboring pixels. This objective leads 
naturally to a graph G = (V, E) corresponding to the image with pixels. The original (noisy) image is 
given as a vertex labeling s, while the goal of the optimization problem is to recover the 'true' image x, 
which is another set of vertex labels. The requirement of x being close to s is quantified by ||x — s| || , 
specifically summing over the squares of what we identify as noise while the smoothness term is a sum over 
absolute values of difference between adjacent pixels' labels: 




(2.3) 



(u,v)eE 
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This objective can be viewed as an instance of the fused LASSO objective [TSR + 05]. As the orientation 
of the underlying pixel grid is artificially imposed by the camera, this method can introduce rotational bias 
in its output. One way to correct this bias is to group the differences of each pixel with its 4 neighbors, 
giving terms of the form: 



\/ (x u - x v ) 2 + (x u - x w ) 2 (2.4) 
where v and w are the horizontal and vertical neighbor of u. 

Our generalization of these objectives is based on the key observation that \J (x u — x v ) 2 + (x u — x w ) 2 
and I both L2 norms of a vector containing differences of values of adjacent pixels. Each such 

difference can be viewed as an edge in the underlying graph, and the grouping gives a natural partition of 
the edges into disjoint sets S\ . . . Sk- 



l|x-s||!+ E J E (*»-**) 2 (2-5) 
i<i<k y (u,v)eSi 

It can be checked that when each Si contain a single edge, this formulation is identical to the objective in 
Equation 2.3 since ^/ (x u — x v ) 2 = \x u — x v \. Then all the terms can be written as quadratic positive semi- 
definite terms involving x and Sj, where we now allow for different fixed values in the groups. Specifically 
when given symmetric positive semidefinite (PSD) matrices Lo . . . Lfc, the objective can be rewritten as: 

l|x-s ||!+ Yl ^/(x-Si) T Li(x-Si) (2.6) 

\<i<k 

If we use || • IIl* to denote the norm induced by the PSD matrix Lj, each of the terms can be written 
as I |x — Sj 1 1 To make the first term resemble the other terms in our objective, we will take a square root 
of it - as we prove in Appendix D, algorithms that give exact minimizers for this variant still captures the 
original version of the problem. This simplification allows us to define our main problem: 

Definition 2.1 The grouped least squares problem is: 

Input: n x n matrices L\ . . . and fixed potentials si . . . G . 
Output: 

min OBJ(x) = \\x-Si\\ L . 

X L ' 

l<i<fc 

Note that this objective allows for the usual definition of LASSO involving terms of the form |x u | by 
having one group for each such variable with s = 0. It is also related to group LASSO [YL06], which 
incorporates similar assumptions about closer dependencies among some of the terms. To our knowledge 
grouping has not been studied in conjunction with fused LASSO, although many problems such as the 
ones listed in Section 3 require this generalization. 

2.1 Quadratic Minimization and Solving Linear Systems 

Our algorithmic approach to the group least squares problem crucially depends on solving a related 
quadratic minimization problem. Specifically, we solve linear systems involving a weighted combination of 
the Lj matrices. Let wi . . . G 5i + denote weights, where Wj is the weight on the ith group. Then the 
quadratic minimization problem that we consider is: 
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mmOBJ2(x,w) = V — I |x — s.11? . 

l<i<fc 

We will use 0PT2(w) to denote the minimum value that is attainable. This minimizer, x, can be 
obtained using the following Lemma: 

Lemma 2.2 OBJ2{x, w) is minimized for x such that 




Therefore the quadratic minimization problem reduces to a linear system solve involving ^ i ^"Lj, or 
ctiLi where a is an arbitrary set of positive coefficients. In general, this can be done in 0(ro a; ) time where 
u is the matrix multiplication constant [Str69, CW90, VW12]. When Lj is symmetric diagonally dominant, 
which is the case for image applications and most graph problems, these systems can be approximately 
solved to e accuracy in 0(mlog(l/e)) time, where m is the total number of non-zero entries in the matrices 
[ST04, ST06, KMP10, KMP11], and also in Oim 1 / 3 ^ log(l/e)) parallel depth [BGK+11]. There has 
also been work on extending this type of approach to a wider class of systems [AST09], with works on 
systems arising from well-spaced finite-element meshes [BHV04], 2-D trusses [DS07], and certain types of 
quadratically coupled flows [KMP12]. For the analysis of our algorithms, we treat this step as a black box 
with running time T(n,m). Furthermore, to simplify our presentation we assume that the solves return 
exact answers, as errors can be brought to polynomially small values with an extra O(logn) overhead. We 
believe analyses similar to those performed in [CKM + 11, KMP12] can be adapted if we use approximate 
solves instead of exact ones. 

3 Applications 

A variety of problems ranging from computer vision to statistics can be formulated as grouped least squares. 
We describe some of them below, starting with classical problems from image processing. 

3.1 Total Variation Minimization 

As mentioned in Section 1, one of the earliest applications of these objectives was in the context of image 
processing. More commonly known as total variation minimization in this setting [CS05], various variants 
of the objective have been proposed with the anisotropic objective the same as Equation 2.3 and the 
isotropic objective being the one shown in Equation 2.5. 

Obtaining a unified algorithm for isotropic and anisotropic TV was one of the main motivations for our 
work. Our result readily applies to both situations, giving approximate algorithms that run in 
for both cases. It's worth noting that this guarantee does not rely on the underlying structure of the of 
the graph. This makes the algorithm readily applicable to 3-D images or non-local models involving the 
addition of edges across the image . However, when the neighborhoods are those of a 2-D image, a saving 
by a log ra factor can be obtained by using the optimal solver for planar systems by Koutis and Miller 
[KM07]. 

3.2 Denoising with Multiple Colors 

Most works on image denoising deals with images where each pixel is described using a single number 
corresponding to its intensity. A natural extension would be to colored images, where each pixel has a set 
of c attributes (in the RGB case, c = 3). One possible analogue of | X 2 X j | in this case would be llx, — xJlo, 
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and this modification can be incorporated by replacing a cluster involving a single edge with clusters over 
the c edges between the corresponding pixels. 

This type of approach can be viewed as an instance image reconstruction algorithms using Markov 
random fields. Instead of labeling each vertex with a single attribute, a set of c attributes are used instead 
and the correlation between vertices is represented using arbitrary PSD matrices. It's worth remarking 
that when such matrices have bounded condition number, it was shown in [KMP12] that the resulting least 
squares problem can still be solved efficiently by preconditioning with SDD matrices, yielding a similar 
overall running time. 

3.3 Poisson Image Editing 

The Poisson Image Editing method of Perez, Gangnet and Blake [PGB03] is a very popular method for 
image blending. This method aims to minimize the difference between the gradient of the image and a 
guidance field vector v. We show here that the grouped least square problem can be used for minimizing 
objectives from this framework. The objective function given in equation (6) of [PGB03] 

n y n (fp ~ fi ~ u w) 2 > with fp = fp ' Vp G dn 

/|n ( P , g )nn^0 

comprises mainly of terms of the form: 



(Xp Xq Vpq) 

This term can be rewritten as ((x p — x q ) — (v pq — 0)) 2 . So if we let s, be the vector where Sj iP = v pq 
and Si >q = 0, and L$ be the graph Laplacian for the edge connecting p and q, then the term equals to 
||x — Si||t .. The other terms on the boundary will have Xq clS Si constant, leading to terms of the form 
\\ x i,p ~ s i,pll| where Si :P = x q . Therefore the discrete Poisson problem of minimizing the sum of these 
squares is an instance of the quadratic minimization problem as described in Section 2.1. Perez et al. in 
Section 2 of their paper observed that these linear systems are sparse, symmetric and positive definite. We 
make the additional observation here that the systems involved are also symmetric diagonally dominant. 
The use of the grouped least squares framework also allows the possibility of augmenting these objectives 
with additional L\ or L 2 terms. 

3.4 Clustering 

Hocking et al. [HVBJ11] recently studied an approach for clustering points in d dimensional space. Given 
a set of points x± . . . x n € 3ft rf , one method that they proposed is the minimization of the following objective 
function: 

n 

min y^\\xj - yi\\l + XS^w^Wyi - yj\\ 2 
" : ,,v/ i=i ij 

Where Wij are weights indicating the association between items i and j. This problem can be viewed 
in the grouped least squares framework by viewing each Xi and as a list of d variables, giving that 
the \\xi — yi\\2 and \\yi — yj\\2 terms can be represented using a cluster of d edges. Hocking et al. used 
the Frank- Wolfe algorithm to minimize a relaxed form of this objective and observed fast behavior in 
practice. In the grouped least squares framework, this problem is an instance with 0(n 2 ) groups and 
0(dn 2 ) edges. Combining with the observation that the underlying quadratic optimization problems can 
be solved efficiently allows us to obtain an 1 + e approximate solution in 0(dn 8 / 3 e -8 / 3 ) time. 
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4 Previous Algorithmic Results 

Due to the importance of optimization problems motivated by LASSO there has been much work on efficient 
algorithms for them. We briefly describe some of the previous approaches for LASSO minimization below. 

4.1 Second-order Cone Programming 

To the best of our knowledge, the only algorithms that provide robust worst-case bounds for the entire class 
of grouped least squares problems are based on applying tools from convex optimization. In particular, it 
is known that interior point methods applied to these problems converge in 0(^/k) iterations with each 
iterations requiring solving a certain linear system [BV04, GY04]. Unfortunately computing these solutions 
is computationally expensive - the best previous bound for solving one of these systems is 0{m u ) where 
uj is the matrix multiplication exponent. This results in fairly large 0{m l / 2+ ^) total running time and 
contributes to the popularity of first-order methods described above in practical scenarios. We will revisit 
this approach in Subsection 5.2 and show an improved algorithm for the inner iterations. However, its 
running time still has a fairly large dependency on k. 

4.2 Graph Cuts 

For the anisotropic total variation objective shown in Equation 2.3, a minimizer can be found by solving a 
large number of almost-exact maximum flow calls [DS06, KZ04]. Although the number of iterations can be 
large, these works show that the number of problem instances that a pixel can appear in is small. Combining 
this reduction with the fastest known exact algorithm for the maximum flow problem by Goldberg and 
Rao [GR98] gives an algorithm that runs in 0(m 3//2 ) time. 

It's worth mentioning that both of these algorithms requires extracting the minimum cut in order to 
construct the problems for subsequent iterations. As a result, it's not clear whether recent advances on 
fast approximations of maximum flow and minimum s-t cuts [CKM + 11] can be used as a black box with 
these algorithms. Extending this approach to the non-linear isotropic objective also appears to be difficult. 

4.3 Iterative Reweighted Least Squares 

An approach similar to convex optimization methods, but has much better observed rates of convergence 
is the iterative reweighted least squares (IRLS) method. This method does a much more aggressive ad- 
justment each iteration and to give good performances in practice [WR07]. 

4.4 First Order Methods 

The method of choice in practice are first order methods such as [NES07, BCG11]. Theoretically these 
methods are known to converge rapidly when the objective function satisfies certain Lipschitz conditions. 
Many of the more recent works on first order methods focus on lowering the dependency of e under these 
conditions. As discussed in Section 1 and Appendix A, this direction can be considered orthogonal to our 
guarantees as the grouped least squares problem is a significantly more general formulation. 

5 Solving Grouped Least Squares Using Quadratic Minimizations 

In this section, we show two algorithms for the grouped least squares problem based on direct adaptations 
of state-of-the-art algorithms for maximum flow [CKM + 11] and minimum cost flow [DS08]. Our guarantees 
can be viewed as reductions to the quadratic minimization problems described in Section 2.1. As a result, 
they imply efficient algorithms for problems where fast algorithms are known for the corresponding least 
squares problems. The analyses of these algorithms are intricate, but mostly follow the presentations in 
[CKM+11, DS08, BV04]. They're presented in Appendices B and C. 

5.1 Approximate Algorithm 

Our first algorithm is based on the electrical flow based maximum flow and minimum cut algorithm by 
Christiano et al. [CKM + 11]. Recall that the minimum s-t cut problem - equivalent to an Li-minimization 
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problem - is a special case of the grouped least squares problem where each edge belongs to its own 
group(i.e., k = m). As a result, it's natural to extend the approach of [CKM + 11] to the whole spectrum 
of values of k by treating each group as an 'edge'. 

One view of the cut algorithm from [CKM + 11] is that it places a weight on each group, and minimizes 
a quadratic, or L\ problem involving terms of the from ^-| |x — Sj| 1^.- These weights are adjusted using the 
multiplicative weights update framework [AHK05, LW94] based on the energy of each group. The terms 
||x — Sj||i Jj and — ||x — 1 1 ^ . are equal when Wj = ||x — Sj||l 4 . Therefore, one natural view of this routine is 
that it gradually adjusts the weights to become a scaled copy of ||x — s^Il^ This leads to a simplification 
of the Christiano et al. algorithm, whose update requires a flow obtained from the dual of the quadratic 
minimization problem. Pseudocode of the algorithm is shown in Algorithm 1. 

Algorithm 1 Algorithm for the approximate decision problem of whether there exist vertex potentials 
with objective at most OPT 

ApproxGroupedLeastSquares 

Input: PSD matrices Li...Lfc, fixed values S\ . . . for each group. Routine Solve for solving linear 
systems, width parameter p and error bound e. 
Output: Vector x such that OBJ{x) < (1 + 10e)OPT. 

1: Initialize = 1 for all 1 < i < k 
2: N 10,0 log ne~ 2 
3: for t = 1 ... AT do 

4: ^^E^r' 

5: Use Solve to compute a minimizer for the quadratic minimization problem where «j = (t 1 _ 1) , let 

this solution be x^ 
6: Let AW = y/ f j,( t - 1 )OBJ2{xW) 

7: Update the weight of each group: wf } <- wf _1) + (f "^'^"^ + ^] H {t ~ 1] 
8: end for 

9: t <- argmintK^Ar OBJ(x®) 
10: return x® 



The main difficulty of analyzing this algorithm is that the analysis of minimum s-t cut algorithm of 
[CKM + 11] relies strongly on the existence of a solution where x is either or 1. Our analysis extends this 
potential function into the fractional setting via. a function based on the Kulback-Liebler (KL) divergence 
[KL51]. To our knowledge the use of this potential with multiplicative weights was first introduced by 
Freund and Schapire [FS99], and is common in learning theory. This function can be viewed as measuring 
the KL-divergence between and ||x — SjHi^ over all groups, where x an optimum solution to the 
grouped least squares problem. Formally, the KL divergence between these two vectors is: 

Dkl = E ||x - s,|| Li log (5.8) 
Subtracting the constant term given by Yli II* — ^HtC) log ( ] | x: — s,,|| (*)) and multiplying by — 1/OPT 

i i 

gives us our key potential function, i/W: 
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uit) = oPr Ell*-*>sK) (5-9) 

i 

It's worth noting that in the case of the cut algorithm, this function is identical to the potential function 
used in [CKM+11]. We show the convergence of our algorithm by proving that if the solution produced in 
some iteration is far from optimal, i/*) increases substantially. Upper bounding it with a term related to 
the sum of weights, fi® allows us to prove convergence. The full proof is given in Appendix B. 

To simplify the analysis, we assume that the guess that we're trying to solve the decision problem 
on, OPT, all entries of s, and spectrum of Lj are polynomially bounded in n. That is, there exist some 
constant d such that — n d < Sj jU < n d and n~ d I ^ Lj ^ n d I where A ^ B means B — A is PSD. Some 
of these assumptions can be relaxed via. analyses similar to Section 2 of [CKM + 11]. 

Theorem 5.1 On input of an instance of OBJ with edges partitioned into k sets. If all parameters 
polynomially bounded between n~ d and n d , running ApproxGroupedLeastSquar.es with p = 2/c 1 / 3 e~ 2//3 
returns a solution x with such that OBJ{x) < max{(l + lOe) OPT, n~ d } where OPT is the value of the 
optimum solution. 

The additive n~ d case is included to deal with the case where OPT = 0, or is close to it. We believe it 
should be also possible to handle this case via. condition numbers restrictions on J^jLj. 

5.2 Almost-Exact Algorithm 

We now show improved algorithms for solving the second order cone programming formulation given in 
[BV04, GY04]. It was shown in [DS08] that in the linear case, as with graph problems such as maximum 
flow, minimum cost flow and shortest path, interior point algorithms reduce the problem to solving 0(m 1 ^ 2 ) 
symmetrically diagonally dominant linear systems. The grouped least squares formulation creates artifacts 
that perturb the linear systems generated by the interior point algorithms, making the resulting system 
both more difficult to interpret and to solve. However, the iteration count of this approach also only 
depends on k, [GY04, BV04], and has a better dependency on e of 0(log(l/e)). 

There are various ways to solve the grouped least squares problem using interior point algorithms. We 
follow the log-barrier method, as presented in Boyd and Vandenberghe [BV04] here for simplicity. This 
formulation defines one extra variable yi for each group and enforces yi > ||x — Si\\i Ji using the barrier 
function </>j(x, yi) = \og{yf — ||x — SjIIl ). Minimizing t • (YliVi) f° r gradually increasing values of t gives 
the following sequence of functions to minimize: 

/(*, x, y) = t ]T Vi - M?/* - II* - Sil Ifj (5-10) 

i i 

Various interior point algorithms have been proposed, one commonality that they have is finding an 
update direction by solving a linear system. The iteration guarantees for recovering almost exact solution 
can be characterized as follows: 

Lemma 5.2 (Section 11.5.3 from [BV04J) A solution that's within additive e of the optimum solution can 
be produced in 0{k 1 ' 2 log(l/e)) steps, each of which requires solving a linear system involving V 2 f(t, x, y) 
for some value of t, x and y. 

These systems are examined in detail in Appendix C. Since the t yi term is linear, it can be omitted 
from the Hessian, leaving ^2 i V 2 (j)(x,yi). We then check that the barrier term yi creates a low rank 
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perturbation to the L, term, which is the Hessian for ||x — SjH^.. By taking Schur complements and 
applying the Sherman-Morrison- Woodbury identity on inverses for low rank perturbations, we arrive at 
the following observation in Appendix C: 

Theorem 5.3 Suppose there is an algorithm for solving linear systems of the form ^i&iLi in T(n,m) 
time where m. For any choice of x,y, a linear system involving the Hessian ofcp(x,y), V 2 (j)(x,y) can be 
solved in 0(k u + kT(n, m) + k 2 n) time. 

Combining this with the iteration count of 0{k 1 ^ 2 log(l/e)) gives a total running time of 0((/c w+1 / 2 + 
k 3 / 2 T(n, m) + k 5 / 2 n) log(l/e)). 

6 Experimental Results 

We performed a series of experiments using the approximate algorithm described in Section 5.1. The 
SDD linear systems that arise in the quadratic minimization problems were solved using the combinatorial 
multigrid (CMG) solver [KM09, KMT09]. One side observation from our experiments is that for the sparse 
SDD linear systems that arise from image processing, the CMG solver yields good results both in accuracy 
and running time. 

6.1 Total Variational Denoising 

Total Variational Denoising is the concept of applying Total Variational Minimization as denoising process. 
This was pioneered by Rudin, Osher and Fatemi [ROF92] and is commonly known as the ROF image model 
[CS05]. Our algorithm from Section 5.1 yields a simple way to solve the ROF model and most of its variants. 
In Figure 1, we present a simple denoising experiment using the standard image processing data set, 'Leena'. 
The main goal of the experiment is to show that our algorithm is competitive in terms of accuracy, while 
having running times comparable to first-order methods. On a 512 x 512 grayscale image, we introduce 
Additive White Gaussian Noise (AWGN) at a measured Signal to Noise Ratio (SNR) of 2. AWGN is the 
most common noise model in photon capturing sensors from consumer cameras to space satellites systems. 
Error is measured as the intensity difference from the original, uncorrupted image summed over all pixels 



Noisy Version Goldstein-Osher [GO09] Micchelli et al. [MSX11] Grouped Least Squares 




22.5E6 13.0E6 14.6E6 3.89E6 



Figure 1: Outputs of Various Algorithms for Denoising Image with AWGN Noise with Total Error Below 

Our experiments were conducted on a single core 64-bit Intel(R) Xeon(R) E5440 CPU @ 2.83GHz. The 
non-solver portion of the algorithm was implemented in Matlab(R). On images of size 256 x 256, 512 x 512 
and 1024 x 1024, the average running times are 2.31, 9.70 and 47.61 seconds respectively. 

It's worth noting is that on average 45% of the total running time is from solving the SDD linear 
systems using the CMG solver. The rest is due to reweighting edges in Matlab, and should be handled as 
part of the CMG solver routine in more optimized versions. More importantly, in all of our experiments 
the weights are observed to converge in under 15 iterations, even for larger images of size up to 3000 x 3000. 
This is much better than the guarantees given for either of our algorithms in Section 5. 
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6.2 Image Processing 

As exemplified by the denoising with colors application in Section 3.2, the grouped least squares framework 
has great flexibility in expressing image processing tasks. We applied our denoising algorithm to Optical 
Coherence Tomography (OCT) images of the retina as a preprocessing step for segmentation. Here the key 
is to preserve the sharpness between the nerve fiber layers and this is achieve by using a L\ regularization 
term. 

Variations of this formulation allows one to model a large variety of established image preprocessing 
applications. For example, Gaussian blurred images can be obtained using a L2 penalty term. This 
simulates camera zoom and is similar to the preprocessing step in the popular Scale Invariant Feature 
Transform (SIFT) algorithm. By mixing and matching penalty terms on the groups, we can preserve 
global features while favoring the removal of small artifacts that are often the result of sensor noise. Our 
approaches also extend naturally to multichannel images, (RGB or multi-spectral), with little modification 
to the underlying algorithm. 




Noisy OCT image of retina 






Yellowed newspaper scan 




Denoised image that preserved 
key features. 



Global features 



Enhanced scan 



An example of Poisson Image Editing mentioned in Section 3.3 is shown in Figure 6.2. The specific 
application is seamless cloning as described in Section 3 of [PGB03], which aims to insert complex objects 
into another image. Given two images, they are blended by solving the discrete poisson equation based 
on a mix of their gradients and boundary values. We also added L2 constraints on different parts of the 
image to give a smoother result. 
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Carnegie 

Mellon 

University 



Image of Mars from Curiosity 



CMU Logo 



Carnegie 
Mellon 
Univer 



Will it Blend? 




Figure 2: Example of Poisson Image Editing 



7 Remarks 



We believe that the ability of our algorithm to encompass many of the current image processing algorithms 
represents a major advantage in practice. It allows the use of a common data structure (the underly- 
ing graph) and subroutine (the SDD solver) for many different tasks in the image processing pipeline. 
Theoretically, this feature is also interesting as it represents a smooth interpolation between L2 and L\ 
problems. 

The performances of our algorithms depend on k, which is the number of groups in the formulation given 
in Definition 2.1. , it gives them provable runtime guarantees. Two settings of k are helpful for comparison 
to previous works. When k = 1, the problem becomes the electrical flow problem, and the running time 
of both algorithms are similar to directly solving the linear system. This is also the case when there is a 
small (constant) number of groups. The other extremum is when each edge belongs to its own group, aka. 
k = m. Here our approximate algorithm is the same as the minimum s-t cut algorithm given in [CKM + 11], 
but our analysis for our almost exact algorithm gives a much worse running time. This is due to the interior 
point algorithm generating much more complicated linear systems, and actually occur when most groups 
contain a small number of edges. As a result, finding faster algorithms with better error guarantees for 
problems with intermediate to large values of k is an interesting direction for future work. Also, preliminary 
experimental results such as the ones from Section 6 show that more aggressive reweightings of edges lead 
to much faster convergence than what we showed for our two algorithms. Therefore a suite of examples 
where the theoretical guarantees are tight would also give a better understanding of the interplay between 
multiplicative updates and quadratic minimization. 

One other consequence of this dependency on k is that although the problem with smaller number of 
groups is no longer captured by linear optimization, the minimum s-t cut problem - that still falls within 
the framework of linear optimization is in some sense the hardest problem in this class. Therefore we 
believe that the grouped least squares problem is a natural interpolation between the L\ and Li problems, 
and has potential as an intermediate subroutine in graph algorithms. 
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A Proofs about Graph Problems as Minimizing LASSO Objectives 

In this section we give formal proofs that show the shortest path problem is an instance of LASSO and 
the minimum cut problem is an instance of fused-LASSO. 

It's worth noting that our proofs do not guarantee that the answers returned are a single path or cut. 
In fact, when multiple solutions have the same value it's possible for our algorithm to return a linear 
combination of them. However, we can ensure that the optimum solution is unique using the Isolation 
Lemma of Mulmuley, Vazarani and Vazarani [MVV87] while only incurring polynomial increase in edge 
lengths/ weights. This analysis is similar to the one in Section 3.5 of [DS08] for finding a unique minimum 
cost flow, and is omitted here. 

We prove the two claims in Fact 1.1 about shortest path and minimum cut separately in Lemmas A.l 
and A. 2. 

Lemma A.l Given a s-t shortest path instance in an undirected graph where edge lengths I : E —> 
are integers between 1 and n d . There is a LASSO minimization instance where all entries are bounded by 
n O(d) suc ^ ffoaf £/j e va l ue f fag LASSO minimizer is within 1 of the optimum answer. 

Proof Our reductions rely crucially on the edge- vertex incidence matrix, which we denote using B. 
Entries of this matrix are defined as follows: 



- 1 if u is the head of e 
B e ,M = { 1 if u is the tail of e (1-11) 
otherwise 

We first show the reduction for shortest path. Then a path from s to t corresponds to a flow value 
assigned to all edges, f : E — > K such that B T f = Xs,t- If we have another flow f y corresponding to any 
path from s to t, then this constraint can be written as: 

\\B T f- Xs ,t\\2 =0 (1-12) 
||B T (f-f)|| 2 =0 

Hf-f|lBBT=0 (1-13) 

(1.14) 

The first constraint is closer to the classical LASSO problem while the last one is within our definition 
of grouped least squares problem. The length of the path can then be written as X^eM/el- Weighting 
these two terms together gives: 



min\\\f-f\\l BT +J2le\fe\ (1-15) 

e 

Where the maximum entry is bounded by max{n d , n 2 X}. Clearly its objective is less than the length of 
the shortest path, let this solution be f. Then since the total objective is at most n d+1 , we have that the 
maximum deviation between B T f and Xs.t is at most n d+1 /\. Then given a spanning tree, each of these 
deviations can be routed to s or t at a cost of at most n d+1 per unit of flow. Therefore we can obtain f y such 
that B^Y = Xst whose objective is bigger by at most n 2d+2 /X. Therefore setting A = n 2d+2 guarantees 
that our objective is within 1 of the length of the shortest path, while the maximum entry in the problem 
is bounded by n°^ d \ ■ 
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We now turn our attention to the minimum cut problem, which can be formulated as finding a vertex 
labeling ^_( vert ) where x[ mri ' = 0, x^" = 1 and the size of the cut, YIuvge l^u^ — x^" |. Since the 
L\ term in the objective can incorporate single variables, we use an additional vector ^- ed 9 e ) to indicate 
differences along edges. The minimum cut problem then becomes minimizing |x^ edfi,e ^| subject to the 
constraint that s^ ed 9 e ) = B'x^ er ^ +Bxt, where B' and -x^ vert ) are restricted to vertices other than s and t 
and Xt is the indicator vector that's 1 on t. The equality constraints can be handled similar to the shortest 
path problem by increasing the weights on the first term. One other issue is that |x^ ert ) | also appears in 
the objective term, and we handle this by scaling down y^ vert ) f or equivalently scaling up B'. 

Lemma A. 2 Given a s-t minimum cut instance in an undirected graph where edge weights w : E — > 3f? + 
are integers between 1 and n d . There is a LASSO minimization instance where all entries are bounded by 
n O{d) suc /j £/j e va i ue f fog LASSO minimizer is within 1 of the minimum cut. 

Proof 

Consider the following objective, where Ai and A2 are set to n d+3 and n 2 : 



A 1 ||A 2 B / x( 1 ' ert )' +B Xt - x( e ^ e )||2 + Ix^ 4 ')]! + |x( ed » e )|i (1.16) 

Let x be an optimum vertex labelling, then setting -^ vert ') to the restriction of n~ 2 x on vertices other 
than s and t and x( edffe ) to Bx^ er *' makes the first term 0. Since each entry of x is between and 1, the 
additive increase caused by |x^ er *)|i is at most 1/n. Therefore this objective's optimum is at most 1/n 
more than the size of the minimum cut. 

For the other direction, consider any solution x.( vert ' ,x( erffiie ) whose objective is at most 1/n more than 
the size of the minimum cut. Since the edge weights are at most n d and s has degree at most n, the total 
objective is at most n d+1 + 1. This gives: 

\\X 2 B'x.( vert Y + B Xt ~ * {ed9e) \\l <n' x (1.17) 
|| A2B / X («ert)' +Bxt _ x (ed S e)|| i < 1 1 A 2 B'x^' + B X t - x^ e ) 1 1 2 < n^ 2 (1.18) 

Therefore changing x^ e 9e > increases the objective by at most n x / 2 < 1. This 

gives a cut with weight within 1 of the objective value and completes the proof. ■ 

We can also show a more direct connection between the minimum cut problem and the fused LASSO 
objective, where each absolute value term may contain a linear combination of variables. This formulation 
is closer to the total variation objective, and is also an instance of the problem formulated in Definition 
2.1 with each edge in a group. 

Lemma A. 3 The minimum cut problem in undirected graphs can be written as an instance of the fused 
LASSO objective. 

Proof Given a graph G = (V,E) and edge weights cost, the problem can be formulated as labeling the 
vertices of V \ {s, t} in order to minimize: 



cost u „|x u -x v \+ ^2 cost sn |x u - 0| + ^2 cost ni |x n - 1| (1-19) 



E 

uv£E,u,v£V\{s,t} su£E,u£V\{s,t} ut£E,u£V\{s,t} 
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B Multiplicative Weights Based Approximate Algorithm 

In this section we show that the approximate algorithm described in Section 5.1 finds a solution close to the 
optimum. We first show that if A^ as defined on Line 6 of Algorithm 1 is an upper bound for OBJiy^ 1 . 
This is crucial in our use of it as the normalizing factor in our update step on Line 7. 

Lemma B.l In all iterations we have: 



OBJ(x^) < 

Proof By the Cauchy-Schwarz inequality we have: 



(^)-(Ew-)(E^i| X 

K? llxw -*) 2 



(i) -s,iiL 



=OBJ(x (t) ) 2 (2.20) 

Taking square roots of both sides completes the proof. ■ 
At a high level, the algorithm assigns weights Wj for each group, and iteratively reweighs them for N 
iterations. Recall that our key potential functions are fj,^' which is the sum of weights of all groups, and: 

^^O^El^-^I^Mwf) (2.21) 

i 

Where x is a solution such that OBJ{5L) = OPT. We will show that if OBJ'(x.^ t '), or in turn is 
large, then increases at a rate substantially faster than . These bounds, and the relation between 
//*•> and i/M is summarized below: 

Lemma B.2 1. 

< logOxW) (2.22) 



< fl + f(I±MA ufi-i) (2 .23) 



and 



logOuW ) < £ ^ + 2e \ + log k (2.24) 



P 

(i-l) 

3. If in iteration t, A^ > (1 + We)OPT and \\x- Si\\ Li < p^-jyX^ for all groups i, then: 
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> + e(1 + 9£) (2.25) 

P 

The relationship between the upper and lower potentials can be established using the fact that Wj is 
non- negative: 

Proof of Lemma B.2, Part 1: 

- (i) =oFfEH--^H^ 1 °g(wf ) ) 

"OPT 



io g (^ ) )^Eii--^n^) 



<log(/i (t) ) (2.26) 



Part 2 follows directly from the local behavior of the log function: 
Proof of Lemma B.2, Part 2: The update rules gives: 



M«"=E 



(*) 



(t _i) / e ||xW-s t || Li 2e^ \ (t _ 1} 
1 + L AW kp 



by update rule on Line 7 of GroupedLeastSquares 
^,- 1 , + £ E,I|xW-s,II N „- 1) + ^| >i( ,- 1) 

„_ 1)+lOSJ (x>')) „_ 1)+ ^ 

p Aw p 

< M (*-D + £ M (*-i) + Hf! M (*-i) By Lemma B.l 
P P 

1 + f£±*T\ M (*-D (2 . 27) 



Using the fact that 1 + x < exp(x) when x > we get: 
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^)< exp ('!(i±My-i) 

£(l + 2 £ )^ (0) 



< exp t a 

V P J 

Taking logs of both sides gives Equation 2.24. 

■ 

This upper bound on the value of fj, also allows us to show that the balancing rule keeps the wjs 
reasonably balanced within a factor of k of each other. The following corollary can also be obtained. 

Corollary B.3 The weights at iteration t satisfy wf' > t[i® ■ 

Proof 

The proof is by induction on t. When t = we have w- ^ = 1, p^ = k and the claim follows from 
j-k = e < 1. When t > 1, we have: 

W W > W (*-D + r^V*- 1 ) By line 7 



kp 

k kp 



' e 2e 2 ' 

> I — + - — ) By the inductive hypothesis 



>jP {t) By Lemma B.2, Part 2 (2.28) 

■ 

The proof of Part 3 is the key part of our analysis. The first order change of u^' is written as a sum of 
products of L2 norms, which we analyze via. the fact that is the solution of a linear system from the 
quadratic minimization problem. 

Proof of Lemma B.2, Part 3: 

We make use of the following known fact about the behavior of the log function around 1: 

Fact B.4 7/0 < x < e, then log(l + x) > (1 - e)x. 
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(i) " v{t ~ 1] = opt S II* " S *H L * log ( w ?0 " oFf ^ ~ Sl||Ll log H* X) ) By Equation 2 - 21 

\<i<k l<i<k 

,(*) \ 



— E 



X - Sj|| L . lo 



w 



OPT .4-^. 1 \w^ _1) 



KKfc 



1 / e ll x(4) - SillL \ 

-OPT ^ ll x — Silki log [ 1 + — T) -— jtZY)) By update rule in line 7 

\<i<k \ p w i / 

1 e ( 1 — e ) I | x (*) _ s .| L n^- 1 ) 

^OPT S II* - s ilk- L - X (t) (t-i) Since log(l + x) > (1 - e)x when < x 



l<i<fc 

Since Lj forms a P.S.D norm, by the Cauchy-Schwarz inequality we have: 



-T^Hx-SillLjIxW-SillL, (2.29) 



||x - Si|| Li ||x^ - Si|| Li >(x - s i ) T L i (x(* ) - Si ) (2.30) 

=||x« - SiWl + (x - x) T L,(x« - si) (2.31) 

Recall from Lemma 2.2 that since x^*^ is the minimizer we have 

faw^U^® ^w™* (2.32) 

^w^Ih^xW -8,5=0 (2.33) 

(x-x^f ^E w f' 1)L ^ (* W ~*) =° ( 2 - 34 ) 



Substituting this into Equation 2.29 gives: 



(2.35) 
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.(t-1) 



- pOPTAW ^w^- 1 )" 



"pOPTAW 



(A W ) 2 By definition of A^) on Line 6 



pOPTAW 

> £ ( 1 - £ )( 1 + 1Qe ) By assump tion that > (1 + 10e)OPT 

> C S±±A (2 .36) 

P 

Since the iteration count largely depends on p, it suffices to provide bounds for p over all the iterations. 
The proof makes use of the following lemma about the properties of electrical flows, which describes the 
behavior of modifying the weights of a group Si that has a large contribution to the total energy. It can 
be viewed as a multiple-edge version of Lemma 2.6 of [CKM+ll]. 

Lemma B.5 Assume that e 2 p 2 < l/10fc and e < 0.01 and let ay -1 ' be the minimizer for OBJ2(wr~ x >). 

(t-i) 



Suppose there is a group i such that \\x^ 1 ) — Si\\z i > p w ^ t _^ A®, then 

OPT2{w { $) < exp OPT2(w®) 

Proof 

We first show that group i contributes a significant portion to OBJ'2{w^ t ~ 1 \~xy~^). Squaring both 
sides of the given condition gives: 



--p 2 ^^p^OBJ2(w^,^) (2.37) 



n.2 



>^-OBJ2(w^ 1 \^ t - 1 h By Corollary B.3 (2.38) 

k 

Also, by the update rule we have > (1 + e)w,[* and > w^* for all 1 < j < k. So we have: 
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0PT2(wW) <OBJ2(w { f\x { f- 1 ^) 



=OBJ2(w®,x.( t -V) - (1 - -J—)]]^*- 1 ) 

<OTJ2(w( f ),x('- 1 ') - -llx^" 1 ) - Si ||i 

2 1 

2 2 

<OBJ2(wW,x (i " 1) ) - ^-OBJ2(w^ 




x 




<exp OB^fwM.xC- 1 )) 




(2.39) 



This means the value of the quadratic minimization problem can be used as a second potential function. 
We first show that it's monotonic and establish rough bounds for it. 

Lemma B.6 OPT2(w^) < n 3d and OPT2(wf$) is monotonically decreasing in t. 

Proof By the assumption that the input is polynomially bounded we have that all entries of s are at 
most n d and L» ^ n d I. Setting x n = gives ||x — 1 1 2 < n d+l . Combining this with the spectrum bound 
then gives ||x — Sj| < n +1 . Summing over all the groups gives the upper bound. 

The monotonicity of OPT2(w ( -^) follows from the fact that all weights are decreasing. ■ 
Combining this with the fact that OPT2(w( Ar )) is not low enough for termination gives our bound on 
the total iteration count. 

Proof of Theorem 5.1: The proof is by contradiction. Suppose otherwise, since OBJ(x^ N ^) > e we 
have: 



\W >(i + i 0e )n 
>2?i~ d OPT 



—d 



(2.40) 




(2.41) 



OPT2(w W ) > 



4 



(2.42) 



n 



2d u (N) 



Which combined with OPT2(w(°)) < n 3d from Lemma B.6 gives: 



OPT2(w(°)) 
OPT2(w( iV )) 




(2.43) 



By Lemma B.2 Part 2, we have: 
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log(M (7V) ) < 



< 



P 
P 



N + log k 



lOtiplogne + logn By choice of N = lOdplogne 



=10(1 + e)e 1 log n + log re 

<10d(l + 2e)e~ 1 log n when e < 0.01 



(2.44) 



Combining with Lemma B.5 implies that the number of iterations where ||x^ ^ — Sj||x, j > p w ^_^ A® 
for i is at most: 



log W] / ( ^ =1M(1 + S^e" 1 logn/ / 



2A- 



fcV3 



By choice of p = 2/c 1 / 3 e 2 / 3 



=8de~ 5/3 A; 1/3 log?i 
=4de _1 /)logn < eiV 



(2.45) 



Jt-i) 



This means that we have ||x^ ^ — Sj||l. < j o ^ ( \_ 1) A® for all 1 < z < k for at least (1 — e)N iterations 
and therefore by Lemma B.2 Part 3: 



U ( N) > l/ (o) + £ ( 1 + 8e ) (1 _ e)iSr>M (iv; 



(2.46) 



Giving a contradiction. ■ 
C Solving Linear Systems from Interior Point Algorithms 

In this section we take a closer look at the linear system solves required by Lemma 5.2, specifically the 
Hessians of the objective given in Equation 5.10. We show that for small values k, having access to 
an efficient subroutines for solving the quadratic minimization problems gives improvements over general 
interior-point algorithms. 

Proof of Theorem 5.3: 

We first consider the barrier function corresponding to each group, 0(x, yi). Its gradient is: 



V0(x, yi) =V - log (yf - (x - Si) T Li(x - s»)) 

L 4 (x - 



x — s. 



illLj 



(3.47) 



and its Hessian is: 



V 2 ^(x, yi ) 



x — s 



|2 \2 



^i(Vi - ll x - s i\\L) + 2 M X - s ;)( x - s i) T Li 2y i L i (x - s») 



2yj(x - Si) 1 Li 



yf + \\x-s 



illLi 



(3.48) 
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Since the variable yi only appears in 0(x, yi), we may use partial Cholesky factorization to arrive at a 
linear system without it. The n x n matrix that we obtain is: 



L^y? - II* - SillL) + ( 2 " 4 2 I ii V ' jT2— J Li(x - Si)(x - Si ) T Li 



2 

I/< + ll x " s *ll] 



-\y} - ||x - SiHfj ( U - -2— ^L,(x - si)(x - s,) T L l ) ] (3.49) 



Note that by the Cauchy-Schwarz inequality this is a PSD matrix. 

Since <fi(x., y) = ^ </>(x, j/j), the pivoted version of V 2 (/>(x, y) can be written as: 



X>L;- ftu^uf (3.50) 

Where Uj = Lj(x— Sj). To solve this linear system we invoke the Sherman- Morrison- Woodbury formula. 

Fact C.l (Sherman-Morrison-Woodbury formula) 

If A, U, C, V are nxn, nxk, kxk and k x n matrices respectively, then: 



(A + UCV)- 1 = A' 1 - A' 1 U{CT l + VA- 1 U)- 1 VA' 1 

In our case we have A = £V a^Lj, C = —I, U = [\//3iUi . . . v^fcu^] and V = U T . So the linear system 
that we need to evaluate becomes: 



A -1 - A^U^A^U - I^U^AT 1 (3.51) 

The system A _1 U can be found using k solves in A = ^^ajLj, which is equivalent to the quadratic 
minimization problem. Multiplying this by U can be done in 0(k 2 n) time and gives us U r A _1 U. This 
kxk system can in turn be solved in k^ time. The other terms can be applied to vectors in either solves 
in A or matrix multiples in U, taking 0(T(n,m) + kn) time. ■ 

D Other Variants 

Although our formulation of OBJ as a sum of Li objectives differs syntactically from some common 
formulations, we show below that the more common formulation involving a Z/2-squared fidelity term can 
be reduced to finding exact solutions to OBJ using 2 iterations of ternary search. Most other formulations 
differs from our formulation in the fidelity term, but more commonly have L\ smoothness terms as well. 
Since the anisotropic smoothness term is a special case of the isotropic one, our discussion of the variations 
will assume anisotropic objectives. 

D.l L\ loss term 

The most common form of the total variation objective used in practice is one with L\ fidelity term. This 
term can be written as ||x — sq| ||, which corresponds to the norm defined by I = Lq. This gives: 



niin ||x-s ||l + ^ ||x-Sj|| Li 

l<i<fc 
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We can establish the value of | |x — so 1 1£ separately by guessing it as a constraint. Since the t 2 is convex 
in t, the following optimization problem is convex in t as well: 

min ^ ll x - ^IIlJIx - s ||l < 

l<i<fc 

Also, due to the convexity of t 2 , ternary searching on the minimizer of this plus t 2 would allow us to 
find the optimum solution by solving O(logn) instances of the above problem. Taking square root of both 
sides of the ||x — so 1 1 l — t 2 condition and taking its Lagrangian relaxation gives: 

k 

minmaxy^ ||x - Sj|| L + A(||x - s ||l - *) 
x a>o t— 1 

- i=i 

Which by the min-max theorem is equivalent to: 



max -At + ^min^ ||x - Sj|| Li + A||x - s ||l J 

The term being minimized is identical to our formulation and its objective is convex in A when A > 0. 
Since —At is linear, their sum is convex and another ternary search on A suffices to optimize the overall 
objective. 

D.2 L\ loss term 

Another common objective function to minimize is where the fidelity term is also under L\ norm. In this 
case the objective function becomes: 

ll x - s o||i + X] Yl H x-S *lk 

i l<i<k 

This can be rewritten as a special case of OBJ as: 

u i l<i<k 

Which gives, instead, a grouped least squares problem with m + k groups. 
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